Art of Assembly: Chapter Six
Randall Hyde
- 6.9 - Program Flow Control Instructions
- 6.9.1 - Unconditional Jumps
- 6.9.2 - The CALL and RET Instructions
- 6.9.3 - The INT, INTO, BOUND, and IRET Instructions
- 6.9.4 - The Conditional Jump Instructions
- 6.9.5 - The JCXZ/JECXZ Instructions
- 6.9.6 - The LOOP Instruction
- 6.9.7 - The LOOPE/LOOPZ Instruction
- 6.9.8 - The LOOPNE/LOOPNZ Instruction
- 6.10 - Miscellaneous Instructions
6.9 Program Flow Control Instructions
The instructions discussed thus far execute sequentially; that is, the
CPU executes each instruction in the order it appears in your program.
Writing real programs requires several control structures, not just the
sequence. Examples include the if statement, loops, and subroutine
invocation (a call). Since compilers reduce all other languages to assembly
language, it should come as no surprise that assembly language supports
the instructions necessary to implement these control structures. 80x86
program control instructions belong to three groups: unconditional transfers,
conditional transfers, and subroutine call and return instructions. The
following sections describe these instructions.
6.9.1 Unconditional Jumps
The jmp (jump) instruction unconditionally transfers control
to another point in the program. There are six forms of this instruction:
an intersegment/direct jump, two intrasegment/direct jumps, an intersegment/indirect
jump, and two intrasegment/indirect jumps. Intrasegment jumps are always
between statements in the same code segment. Intersegment jumps can transfer
control to a statement in a different code segment.
These instructions all use the same general syntax:
jmp target
The assembler differentiates them by their operands:
jmp disp8 ;direct intrasegment, 8 bit displacement.
jmp disp16 ;direct intrasegment, 16 bit displacement.
jmp adrs32 ;direct intersegment, 32 bit segmented address.
jmp mem16 ;indirect intrasegment, 16 bit memory operand.
jmp reg16 ;register indirect intrasegment.
jmp mem32 ;indirect intersegment, 32 bit memory operand.
Intersegment is a synonym for far; intrasegment is a synonym for near.
The two direct intrasegment jumps differ only in their length. The first
form consists of an opcode and a single byte displacement. The CPU sign
extends this displacement to 16 bits and adds it to the ip
register. This instruction can branch to a location -128..+127 bytes from the
beginning of the next instruction following it (i.e., -126..+129 bytes around
the current instruction).
The second form of the intrasegment jump is three bytes long with a two
byte displacement. This instruction allows an effective range of -32,768..+32,767
bytes and can transfer control to anywhere in the current code segment.
The CPU simply adds the two byte displacement to the ip register.
These first two jumps use a relative addressing scheme. The displacement encoded
after the opcode byte is not the target address in the current code
segment, but the distance to the target address. Fortunately, MASM will
compute the distance for you automatically, so you do not have to compute
this displacement value yourself. In many respects, these instructions are
really nothing more than add ip, disp instructions.
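The displacement arithmetic described above can be sketched in a few lines of Python. This is only a model of the address calculation, not assembler output; the function names are invented for the illustration, and the instruction lengths (two and three bytes) are the ones given in the text.

```python
def sign_extend8(value):
    """Interpret an 8-bit value (0..255) as a signed number (-128..127)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def short_jump_target(jmp_offset, disp8):
    """Return the ip value after a two-byte short jmp located at jmp_offset."""
    next_ip = (jmp_offset + 2) & 0xFFFF        # ip already points past the jmp
    return (next_ip + sign_extend8(disp8)) & 0xFFFF


def near_jump_target(jmp_offset, disp16):
    """Return the ip value after a three-byte near jmp located at jmp_offset."""
    next_ip = (jmp_offset + 3) & 0xFFFF
    return (next_ip + disp16) & 0xFFFF         # wraps around within the segment


print(hex(short_jump_target(0x0100, 0xFE)))    # displacement -2: jumps to itself
print(hex(short_jump_target(0x0100, 0x7F)))    # farthest forward short target
```

Because the sum wraps around at 16 bits, a two byte displacement is enough to reach any offset in the current 64K code segment.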
The direct intersegment jump is five bytes long, the last four bytes containing
a segmented address (the offset in the second and third bytes, the segment
in the fourth and fifth bytes). This instruction copies the offset into
the ip register and the segment into the cs register.
Execution of the next instruction continues at the new address in cs:ip.
Unlike the previous two jumps, the address following the opcode is the absolute
memory address of the target instruction; this version does not use relative
addressing. This instruction loads cs:ip with a 32 bit immediate value.
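As a rough illustration of the five byte layout just described, the following Python sketch assembles the bytes of a direct intersegment jump. The opcode value 0EAh and the name encode_far_jmp do not come from the text above; treat both as assumptions of this sketch.

```python
import struct

FAR_JMP_OPCODE = 0xEA  # assumed opcode for the direct intersegment jmp


def encode_far_jmp(segment, offset):
    """Return the five bytes of jmp segment:offset (offset first, then segment)."""
    # "<BHH" = one byte, then two little-endian 16-bit words.
    return struct.pack("<BHH", FAR_JMP_OPCODE, offset, segment)


print(encode_far_jmp(0x1234, 0x5678).hex())
```

Note that each 16 bit value is stored low byte first, so the offset 5678h appears in memory as the bytes 78h, 56h.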
For the three direct jumps described above, you normally specify the target
address using a statement label. A statement label is usually an identifier
followed by a colon, usually on the same line as an executable machine instruction.
The assembler determines the offset of the statement after the label and
automatically computes the distance from the jump instruction to the statement
label. Therefore, you do not have to worry about computing displacements
manually. For example, the following short little loop continuously reads
the parallel printer data port and inverts the L.O. bit. This produces a
square wave electrical signal on one of the printer port output lines:
mov dx, 378h ;Parallel printer port address.
LoopForever: in al, dx ;Read character from input port.
xor al, 1 ;Invert the L.O. bit.
out dx, al ;Output data back to port.
jmp LoopForever ;Repeat forever.
The fourth form of the unconditional jump instruction is the indirect intrasegment
jump instruction. It requires a 16 bit memory operand. This form transfers
control to the offset, within the current code segment, held in the two bytes
of the memory operand. For example,
WordVar word TargetAddress
.
.
.
jmp WordVar
transfers control to the address specified by the value in the 16 bit memory
location WordVar. This does not jump to the statement at address
WordVar; it jumps to the statement at the address held in the
WordVar variable. Note that this form of the jmp instruction
is roughly equivalent to the following (there is no actual mov ip instruction):
mov ip, WordVar
Although the example above uses a single word variable containing the indirect
address, you can use any valid memory address mode, not just the displacement
only addressing mode. You can use memory indirect addressing modes like
the following:
jmp DispOnly ;Word variable
jmp Disp[bx] ;Disp is an array of words
jmp Disp[bx][si]
jmp [bx]
etc.
Consider the indexed addressing mode above for a moment (Disp[bx]).
This addressing mode fetches the word from location Disp+bx
and copies this value to the ip register; this lets you create
an array of pointers and jump to a specified pointer using an array index.
Consider the following example:
AdrsArray word stmt1, stmt2, stmt3, stmt4
.
.
.
mov bx, I ;I is in the range 0..3
add bx, bx ;Index into an array of words.
jmp AdrsArray[bx] ;Jump to stmt1, stmt2, etc., depending
; on the value of I.
The important thing to remember is that the near indirect jump fetches a
word from memory and copies it into the ip register; it does
not jump to the memory location specified, it jumps indirectly through the
16 bit pointer at the specified memory location.
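The table lookup performed by an indexed near indirect jump can be modelled as follows. This Python sketch only mimics the memory fetch; the table offset (10h) and the statement offsets stored in it are made-up values, and near_indirect_jump is a hypothetical helper name.

```python
import struct


def near_indirect_jump(memory, table_offset, index):
    """Model mov bx, I / add bx, bx / jmp AdrsArray[bx]; return the new ip."""
    bx = (index + index) & 0xFFFF              # scale the index by the word size
    (new_ip,) = struct.unpack_from("<H", memory, table_offset + bx)
    return new_ip                              # the word fetched, not its address


# Made-up layout: a four entry table at offset 10h holding the
# offsets of stmt1..stmt4.
memory = bytearray(64)
struct.pack_into("<4H", memory, 0x10, 0x0200, 0x0210, 0x0220, 0x0230)

print(hex(near_indirect_jump(memory, 0x10, 2)))   # the third entry, stmt3
```

The function returns the word stored in the table, never the address of the table entry itself, which is exactly the distinction the paragraph above draws.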
The fifth jmp instruction transfers control to the offset given
in a 16 bit general purpose register. Note that you can use any general
purpose register, not just bx, si, di, or bp. An instruction of the form
jmp ax
is roughly equivalent to
mov ip, ax
Note that the previous two forms (register or memory indirect) are really
the same instruction. The mod and r/m fields of a mod-reg-r/m byte specify
a register or memory indirect address. See Appendix D for the details.
The sixth form of the jmp instruction, the indirect intersegment
jump, has a memory operand that contains a double word pointer. The CPU
copies the double word at that address into the cs:ip register
pair. For example,
FarPointer dword TargetAddress
.
.
.
jmp FarPointer
transfers control to the segmented address specified by the four bytes at
address FarPointer. This instruction is semantically identical
to the (mythical) instruction
lcs ip, FarPointer ;load cs, ip from FarPointer
As for the near indirect jump described earlier, this far indirect jump
lets you specify any arbitrary (valid) memory addressing mode. You are not
limited to the displacement only addressing mode the example above uses.
MASM uses a near indirect or far indirect addressing mode depending upon
the type of the memory location you specify. If the variable you specify
is a word variable, MASM will automatically generate a near indirect jump;
if the variable is a dword, MASM emits the opcode for a far indirect jump.
Some forms of memory addressing, unfortunately, do not intrinsically specify
a size. For example, [bx] is definitely a memory operand, but
does bx point at a word variable or a double word variable?
It could point at either. Therefore, MASM will reject a statement of the
form:
jmp [bx]
MASM cannot tell whether this should be a near indirect or far indirect
jump. To resolve the ambiguity, you will need to use a type coercion operator.
Chapter Eight will fully describe type coercion operators; for now, just
use one of the following two instructions for a near or far jump, respectively:
jmp word ptr [bx]
jmp dword ptr [bx]
The register indirect addressing modes are not the only ones that could
be type ambiguous. You could also run into this problem with indexed and
base plus index addressing modes:
jmp word ptr 5[bx]
jmp dword ptr 9[bx][si]
For more information on the type coercion operators, see Chapter Eight.
In theory, you could use the indirect jump instructions and the setcc
instructions to conditionally transfer control to some given location. For
example, the following code transfers control to iftrue if
word variable X is equal to word variable Y. It
transfers control to iffalse otherwise.
JmpTbl word iffalse, iftrue
.
.
.
mov ax, X
cmp ax, Y
sete bl
movzx ebx, bl
jmp JmpTbl[ebx*2]
As you will soon see, there is a much better way to do this using the conditional
jump instructions.
6.9.2 The CALL and RET Instructions
The call and ret instructions handle subroutine
calls and returns. There are five different call instructions and six different
forms of the return instruction:
call disp16 ;direct intrasegment, 16 bit relative.
call adrs32 ;direct intersegment, 32 bit segmented address.
call mem16 ;indirect intrasegment, 16 bit memory pointer.
call reg16 ;indirect intrasegment, 16 bit register pointer.
call mem32 ;indirect intersegment, 32 bit memory pointer.
ret ;near or far return
retn ;near return
retf ;far return
ret disp ;near or far return and pop
retn disp ;near return and pop
retf disp ;far return and pop
The call instructions take the same forms as the jmp
instructions except there is no short (two byte) intrasegment call.
The far call instruction does the following:
- It pushes the cs register onto the stack.
- It pushes the 16 bit offset of the next instruction following the call
onto the stack.
- It copies the 32 bit target address into the cs:ip registers.
A far call obtains this address either from the segmented address that
follows the opcode or from a double word in memory.
- Execution continues at the first instruction of the subroutine. This
first instruction is the opcode at the target address computed in the previous
step.
The near call instruction does the following:
- It pushes the 16 bit offset of the next instruction following the call
onto the stack.
- It copies the 16 bit effective address into the ip register.
Since the call instruction allows the same addressing modes
as jmp, call can obtain the target address using
a relative, memory, or register addressing mode.
- Execution continues at the first instruction of the subroutine. This
first instruction is the opcode at the target address computed in the previous
step.
The call disp16 instruction uses relative addressing. You can
compute the effective address of the target by adding this 16 bit displacement
to the return address (like the relative jmp instructions, the displacement
is the distance from the instruction following the call to the target address).
The call adrs32 instruction uses the direct addressing mode.
A 32 bit segmented address immediately follows the call opcode.
This form of the call instruction copies that value directly into the cs:ip
register pair. In many respects, this is equivalent to the immediate addressing
mode since the value this instruction copies into the cs:ip register
pair immediately follows the instruction.
Call mem16 uses the memory indirect addressing mode. Like the
jmp instruction, this form of the call instruction
fetches the word at the specified memory location and uses that word's value
as the target address. Remember, you can use any memory addressing mode
with this instruction. The displacement-only addressing mode is the most
common form, but the others are just as valid:
call CallTbl[bx] ;Index into an array of pointers.
call word ptr [bx] ;BX points at word to use.
call WordTbl[bx][si] ; etc.
Note that the selection of addressing mode only affects the effective address
computation for the target subroutine. These call instructions still push
the offset of the next instruction following the call onto the stack. Since
these are near calls (they obtain their target address from a 16 bit memory
location), they all push a 16 bit return address onto the stack.
Call reg16 works just like the memory indirect call above,
except it uses the 16 bit value in a register for the target address. This
instruction is really the same instruction as the call mem16
instruction. Both forms specify their effective address using a mod-reg-r/m
byte. For the call reg16 form, the mod bits contain 11b so
the r/m field specifies a register rather than a memory addressing mode.
Of course, this instruction also pushes the 16 bit offset of the next instruction
onto the stack as the return address.
The call mem32 instruction is a far indirect call. The memory
operand specified by this instruction must be a double word value. This
form of the call instruction fetches the 32 bit segmented address at the
computed effective address and copies this double word value into the cs:ip
register pair. This instruction also pushes the 32 bit segmented address
of the next instruction onto the stack (it pushes the segment value first
and the offset portion second). As with the call mem16 instruction,
you can use any valid memory addressing mode with this instruction:
call DWordVar
call DwordTbl[bx]
call dword ptr [bx]
etc.
It is relatively easy to synthesize the call instruction using
two or three other 80x86 instructions. You could create the equivalent of
a near call using a push and a jmp instruction:
push <offset of instruction after jmp>
jmp subroutine
A far call would be similar; you would add a push cs
instruction before the two instructions above to push a far return
address on the stack.
The ret (return) instruction returns control to the caller
of a subroutine. It does so by popping the return address off the stack
and transferring control to the instruction at this return address. Intrasegment
(near) returns pop a 16 bit return address off the stack into the ip
register. An intersegment (far) return pops a 16 bit offset into the ip
register and then a 16 bit segment value into the cs register.
These instructions are effectively equal to the following:
retn: pop ip
retf: popd cs:ip
Clearly, you must match a near subroutine call with a near return and a
far subroutine call with a corresponding far return. If you mix near calls
with far returns or vice versa, you will leave the stack in an inconsistent
state and you probably will not return to the proper instruction after the
call. Of course, another important issue when using the call
and ret instructions is that you must make sure your subroutine
doesn't push something onto the stack and then fail to pop it off before
trying to return to the caller. Stack problems are a major cause of errors
in assembly language subroutines. Consider the following code:
Subroutine: push ax
push bx
.
.
.
pop bx
ret
.
.
.
call Subroutine
The call instruction pushes the return address onto the stack
and then transfers control to the first instruction of Subroutine.
The first two push instructions push the ax and bx
registers onto the stack, presumably in order to preserve their values because
Subroutine modifies them. Unfortunately, a programming error
exists in the code above: Subroutine only pops bx from the
stack; it fails to pop ax as well. This means that when Subroutine
tries to return to the caller, the value of ax rather than
the return address is sitting on the top of the stack. Therefore, this subroutine
returns control to the address specified by the initial value of the ax
register rather than to the true return address. Since there are 65,536
different values ax can have, there is only a 1 in 65,536 chance
that your code will return to the real return address. The odds are not
in your favor! Most likely, code like this will hang up the machine. Moral
of the story: always make sure the return address is sitting on the stack
before executing the return instruction.
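The effect of the missing pop can be traced with a small model. The following Python sketch is only an illustration: it represents the stack as a list, and run_subroutine and its parameters are hypothetical names, not part of any assembler or library.

```python
def run_subroutine(return_address, ax, bx, pop_ax):
    """Model call Subroutine / push ax / push bx / pop bx / [pop ax] / ret.

    Returns the offset that ret loads into ip.
    """
    stack = []
    stack.append(return_address)   # call pushes the return address
    stack.append(ax)               # push ax
    stack.append(bx)               # push bx
    stack.pop()                    # pop bx
    if pop_ax:
        stack.pop()                # pop ax (missing in the faulty version)
    return stack.pop()             # ret pops whatever is on top into ip


print(hex(run_subroutine(0x1234, 0xBEEF, 0x0000, pop_ax=False)))  # faulty code
print(hex(run_subroutine(0x1234, 0xBEEF, 0x0000, pop_ax=True)))   # corrected code
```

In the faulty version the value returned is the saved copy of ax, so the program "returns" to whatever offset ax happened to contain on entry.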
Like the call instruction, it is very easy to simulate the
ret instruction using two 80x86 instructions. All you need to do is pop
the return address off the stack and then copy it into the ip
register. For near returns, this is a very simple operation: just pop the
near return address off the stack into a register and then jump indirectly
through that register:
pop ax
jmp ax
Simulating a far return is a little more difficult because you must load
cs:ip in a single operation. The only instruction that does
this (other than a far return) is the jmp mem32 instruction.
See the exercises at the end of this chapter for more details.
There are two other forms of the ret instruction. They are
identical to those above except a 16 bit displacement follows their opcodes.
The CPU adds this value to the stack pointer immediately after popping the
return address from the stack. This mechanism removes parameters pushed
onto the stack before returning to the caller. See Chapter Eleven for more
details.
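The stack traffic of the call and return forms can be summarized in a toy model. The Python class below is a sketch, not an emulator: the Cpu class, its method names, the starting register values, and the default instruction lengths (three bytes for a direct near call, five for a direct far call) are assumptions chosen for the illustration.

```python
class Cpu:
    """A toy model of cs:ip, sp, and a 64K stack segment."""

    def __init__(self, cs=0x1000, ip=0x0100, sp=0xFFFE):
        self.cs, self.ip, self.sp = cs, ip, sp
        self.stack = bytearray(0x10000)

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF
        self.stack[self.sp] = word & 0xFF
        self.stack[(self.sp + 1) & 0xFFFF] = (word >> 8) & 0xFF

    def pop(self):
        word = self.stack[self.sp] | (self.stack[(self.sp + 1) & 0xFFFF] << 8)
        self.sp = (self.sp + 2) & 0xFFFF
        return word

    def call_near(self, target, length=3):
        self.push((self.ip + length) & 0xFFFF)   # offset of the next instruction
        self.ip = target

    def call_far(self, seg, off, length=5):
        self.push(self.cs)                       # segment first
        self.push((self.ip + length) & 0xFFFF)   # then the offset
        self.cs, self.ip = seg, off

    def retn(self, disp=0):
        self.ip = self.pop()
        self.sp = (self.sp + disp) & 0xFFFF      # discard parameters, if any

    def retf(self, disp=0):
        self.ip = self.pop()                     # offset comes off first
        self.cs = self.pop()                     # then the segment
        self.sp = (self.sp + disp) & 0xFFFF


cpu = Cpu()
cpu.push(0x0005)              # one word parameter
cpu.call_near(0x0400)         # near call issued from 1000:0100
cpu.retn(2)                   # return and remove the parameter
print(hex(cpu.ip), hex(cpu.sp))
```

After retn 2 the stack pointer is back at its value from before the parameter was pushed, which is the purpose of the displacement form.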
The assembler allows you to type ret without the "f"
or "n" suffix. If you do so, the assembler will figure out whether
it should generate a near return or a far return. See the chapter on procedures
and functions for details on this.
6.9.3 The INT, INTO, BOUND, and IRET Instructions
The int (for software interrupt) instruction is a very special
form of a call instruction. Whereas the call instruction
calls subroutines within your program, the int instruction
calls system routines and other special subroutines. The major difference
between interrupt service routines and standard procedures is that you can
have any number of different procedures in an assembly language program,
while the system supports a maximum of 256 different interrupt service routines.
A program calls a subroutine by specifying the address of that subroutine;
it calls an interrupt service routine by specifying the interrupt number
for that particular interrupt service routine. This chapter will only describe
how to call an interrupt service routine using the int, into, and
bound instructions, and how to return from an interrupt service
routine using the iret instruction.
There are four different forms of the int instruction. The
first form is
int nn
(where "nn" is a value between 0 and 255). It allows you to call
one of 256 different interrupt routines. This form of the int
instruction is two bytes long. The first byte is the int opcode.
The second byte is immediate data containing the interrupt number.
Although you can use the int instruction to call procedures
(interrupt service routines) you've written, the primary purpose of this
instruction is to make a system call. A system call is a subroutine call
to a procedure provided by the system, such as DOS, the PC BIOS, a mouse
driver, or some other piece of software resident in the machine before your
program began execution. Since you always refer to a specific system call by its
interrupt number, rather than its address, your program does not need to
know the actual address of the subroutine in memory. The int instruction
provides dynamic linking to your program. The CPU determines the actual
address of an interrupt service routine at run time by looking up the address
in an interrupt vector table. This allows the authors of such system routines
to change their code (including the entry point) without fear of breaking
any older programs that call their interrupt service routines. As long as
the system call uses the same interrupt number, the CPU will automatically
call the interrupt service routine at its new address.
The only problem with the int instruction is that it supports
only 256 different interrupt service routines. MS-DOS alone supports well
over 100 different calls. BIOS and other system utilities provide thousands
more. This is above and beyond all the interrupts reserved by Intel for
hardware interrupts and traps. The common solution most of the system calls
use is to employ a single interrupt number for a given class of calls and
then pass a function number in one of the 80x86 registers (typically the
ah register). For example, MS-DOS uses only a single interrupt
number, 21h. To choose a particular DOS function, you load a DOS function
code into the ah register before executing the int 21h
instruction. For example, to terminate a program and return control
to MS-DOS, you would normally load ah with 4Ch and call DOS with the
int 21h instruction:
mov ah, 4ch ;DOS terminate opcode.
int 21h ;DOS call
The BIOS keyboard interrupt is another good example. Interrupt 16h is responsible
for testing the keyboard and reading data from the keyboard. This BIOS routine
provides several calls to read a character and scan code from the keyboard,
see if any keys are available in the system type ahead buffer, check the
status of the keyboard modifier flags, and so on. To choose a particular
operation, you load the function number into the ah register before executing
int 16h. The following table lists the possible functions:
BIOS Keyboard Support Functions
Function # (AH) | Input Parameters | Output Parameters | Description |
---|
0 | - | al - ASCII character; ah - scan code | Read character. Reads the next available character from the system's type ahead buffer. Waits for a keystroke if the buffer is empty. |
1 | - | ZF - set if no key, clear if key available; al - ASCII code; ah - scan code | Checks to see if a character is available in the type ahead buffer. Sets the zero flag if no key is available, clears the zero flag if a key is available. If there is an available key, this function returns the ASCII and scan code value in ax. The value in ax is undefined if no key is available. |
2 | - | al - shift flags | Returns the current status of the shift flags in al. The shift flags are defined as follows: bit 7: Insert toggle; bit 6: Capslock toggle; bit 5: Numlock toggle; bit 4: Scroll lock toggle; bit 3: Alt key is down; bit 2: Ctrl key is down; bit 1: Left shift key is down; bit 0: Right shift key is down |
3 | al = 5; bh = 0, 1, 2, 3 for 1/4, 1/2, 3/4, or 1 second delay; bl = 0..1Fh for 30/sec to 2/sec | - | Set auto repeat rate. The bh register contains the amount of time to wait before starting the autorepeat operation, the bl register contains the autorepeat rate. |
5 | ch = scan code; cl = ASCII code | - | Store keycode in buffer. This function stores the value in the cx register at the end of the type ahead buffer. Note that the scan code in ch doesn't have to correspond to the ASCII code appearing in cl. This routine will simply insert the data you provide into the system type ahead buffer. |
10h | - | al - ASCII character; ah - scan code | Read extended character. Like the ah=0 call, except this one passes all key codes; the ah=0 call throws away codes that are not PC/XT compatible. |
11h | - | ZF - set if no key, clear if key available; al - ASCII code; ah - scan code | Like the ah=01h call except this one does not throw away keycodes that are not PC/XT compatible (i.e., the extra keys found on the 101 key keyboard). |
12h | - | al - shift flags; ah - extended shift flags | Returns the current status of the shift flags in ax. The shift flags are defined as follows: bit 15: SysReq key pressed; bit 14: Capslock key currently down; bit 13: Numlock key currently down; bit 12: Scroll lock key currently down; bit 11: Right alt key is down; bit 10: Right ctrl key is down; bit 9: Left alt key is down; bit 8: Left ctrl key is down; bit 7: Insert toggle; bit 6: Capslock toggle; bit 5: Numlock toggle; bit 4: Scroll lock toggle; bit 3: Either alt key is down (some machines, left only); bit 2: Either ctrl key is down; bit 1: Left shift key is down; bit 0: Right shift key is down |
For example, to read a character from the system type ahead buffer, leaving
the ASCII code in al, you could use the following code:
mov ah, 0 ;Wait for key available, and then
int 16h ; read that key.
mov character, al ;Save character read.
Likewise, if you wanted to test the type ahead buffer to see if a key is
available, without reading that keystroke, you could use the following code:
mov ah, 1 ;Test to see if key is available.
int 16h ;Sets the zero flag if a key is not
; available.
The second form of the int instruction is a special case:
int 3
Int 3 is a special form of the interrupt instruction that is
only one byte long. CodeView and other debuggers use it as a software breakpoint
instruction. Whenever you set a breakpoint on an instruction in your program,
the debugger will typically replace the first byte of the instruction's
opcode with an int 3 instruction. When your program executes
the int 3 instruction, this makes a "system call"
to the debugger so the debugger can regain control of the CPU. When this
happens, the debugger will replace the int 3 instruction with
the original opcode.
While operating inside a debugger, you can explicitly use the int 3
instruction to stop program execution and return control to the
debugger. This is not, however, the normal way to terminate a program. If
you attempt to execute an int 3 instruction while running under
DOS, rather than under the control of a debugger program, you will likely
crash the system.
The third form of the int instruction is into.
Into will cause a software interrupt (int 4) if the 80x86 overflow
flag is set. You can use this instruction to quickly test for arithmetic
overflow after executing an arithmetic instruction. Semantically, this instruction
is equivalent to
if overflow = 1 then int 4
You should not use this instruction unless you've supplied a corresponding
trap handler (interrupt service routine). Executing into without such a
handler would probably crash the system.
The fourth software interrupt, provided by 80286 and later processors, is
the bound instruction. This instruction takes the form
bound reg, mem
and executes the following algorithm:
if (reg < [mem]) or (reg > [mem+sizeof(reg)]) then int 5
[mem] denotes the contents of the memory location mem
and sizeof(reg) is two or four depending on whether the register
is 16 or 32 bits wide. The memory operand must be twice the size of the
register operand. The bound instruction compares the values
using a signed integer comparison.
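The algorithm above can be written out as a short Python model. This sketch only reproduces the comparison; bound_raises_int5, to_signed, and the sample limits buffer are hypothetical names and data introduced for the illustration.

```python
import struct


def to_signed(value, bits):
    """Reinterpret an unsigned register value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def bound_raises_int5(reg, memory, mem, bits=16):
    """Return True if bound reg, mem would execute int 5."""
    size = bits // 8                          # sizeof(reg): two or four bytes
    fmt = "<h" if bits == 16 else "<i"        # signed little-endian values
    (lower,) = struct.unpack_from(fmt, memory, mem)
    (upper,) = struct.unpack_from(fmt, memory, mem + size)
    value = to_signed(reg, bits)
    return value < lower or value > upper


# A made-up pair of limits, -5..10, stored as two consecutive words.
limits = bytearray(4)
struct.pack_into("<2h", limits, 0, -5, 10)

print(bound_raises_int5(0xFFFB, limits, 0))   # 0FFFBh is -5 when signed
print(bound_raises_int5(11, limits, 0))       # one past the upper limit
```

Because the comparison is signed, the register value 0FFFBh counts as -5 and falls inside the range, even though it is a large number when viewed as unsigned.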
Intel's designers added the bound instruction to allow a quick check of
the range of a value in a register. This is useful in Pascal, for example,
when checking array bounds and when checking to see if a subrange
integer is within an allowable range. There are two problems with this instruction,
however. On 80486 and Pentium/586 processors, the bound instruction is generally
slower than the sequence of instructions it would replace:
cmp reg, LowerBound
jl OutOfBounds
cmp reg, UpperBound
jg OutOfBounds
On the 80486 and Pentium/586 chips, the sequence above only requires four
clock cycles assuming you can use the immediate addressing mode and the
branches are not taken; the bound instruction requires 7-8
clock cycles under similar circumstances and also assuming the memory operands
are in the cache.
A second problem with the bound instruction is that it executes
an int 5 if the specified register is out of range. IBM, in
their infinite wisdom, decided to use the int 5 interrupt handler
routine to print the screen. Therefore, if you execute a bound
instruction and the value is out of range, the system will, by default,
print a copy of the screen to the printer. If you replace the default int 5
handler with one of your own, pressing the PrtSc key will transfer
control to your bound instruction handler. Although there are
ways around this problem, most people don't bother since the bound
instruction is so slow.
Whatever int instruction you execute, the following sequence
of events follows:
- The 80x86 pushes the flags register onto the stack;
- The 80x86 pushes cs and then ip onto the stack;
- The 80x86 uses the interrupt number (into is interrupt
#4, bound is interrupt #5) times four as an index into the
interrupt vector table and copies the double word at that point in the table
into cs:ip.
The int instructions vary from a call in two major
ways. First, call instructions vary in length from two to six
bytes, whereas int instructions are generally two bytes
long (int 3, into, and bound are the exceptions).
Second, and most important, the int instruction pushes the
flags and the return address onto the stack while the call
instruction pushes only the return address. Note also that the int
instructions always push a far return address (i.e., a cs value
and an offset within the code segment), whereas only the far call pushes
this double word return address.
Since int pushes the flags onto the stack you must use a special
return instruction, iret (interrupt return), to return from
a routine called via the int instructions. If you return from
an interrupt procedure using the ret instruction, the flags
will be left on the stack upon returning to the caller. The iret
instruction is equivalent to the two instruction sequence: retf,
popf (assuming, of course, that you execute popf before returning
control to the address pointed at by the double word on the top of the stack).
The int instructions clear the trace (T) and interrupt enable (I) flags
in the flags register. They do not affect any other flags. The iret instruction,
by its very nature, can affect all the flags since it pops the flags from
the stack.
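The push sequence, the vector table lookup, and the matching iret can be modelled in Python as follows. This is a simplified sketch: it models only the stack and the vector fetch and leaves out any change int makes to the flags, and the function names, the handler address stored in the table, and the register values are all made up for the example.

```python
import struct


def software_interrupt(number, flags, cs, ip, memory, stack, length=2):
    """Model the three steps of int nn and return the new (cs, ip)."""
    stack.append(flags)                      # push the flags register
    stack.append(cs)                         # push cs ...
    stack.append((ip + length) & 0xFFFF)     # ... then ip of the next instruction
    # Each vector is a double word: offset in the low word, segment in the high.
    new_ip, new_cs = struct.unpack_from("<2H", memory, number * 4)
    return new_cs, new_ip


def interrupt_return(stack):
    """Model iret: pop ip, then cs, then the flags."""
    ip = stack.pop()
    cs = stack.pop()
    flags = stack.pop()
    return flags, cs, ip


memory = bytearray(1024)                     # room for 256 four byte vectors
struct.pack_into("<2H", memory, 0x21 * 4, 0x40F8, 0x0019)  # made-up handler
stack = []
print(software_interrupt(0x21, 0x0202, 0x1000, 0x0100, memory, stack))
print(interrupt_return(stack))
```

The caller never supplies the handler's address; changing the double word in the table is all it takes to redirect every int 21h in every program.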
6.9.4 The Conditional Jump Instructions
Although the jmp, call, and ret instructions
provide transfer of control, they do not allow you to make any serious decisions.
The 80x86's conditional jump instructions handle this task. The conditional
jump instructions are the basic tool for creating loops and other conditionally
executable statements like the if..then statement.
The conditional jumps test one or more flags in the flags register to see
if they match some particular pattern (just like the setcc
instructions). If the pattern matches, control transfers to the target location.
If the match fails, the CPU ignores the conditional jump and execution continues
with the next instruction. Some of these instructions test the condition
of a single flag: sign, carry, overflow, or zero. For example, after the execution
of a shift left instruction, you could test the carry flag to determine
if it shifted a one out of the H.O. bit of its operand. Likewise, you could
test the condition of the zero flag after a test instruction
to see if any specified bits were one. Most of the time, however, you will
probably execute a conditional jump after a cmp instruction.
The cmp instruction sets the flags so that you can test for less than, greater
than, equality, etc.
Note: Intel's documentation defines various synonyms or instruction aliases
for many conditional jump instructions. The following tables list all the
aliases for a particular instruction. These tables also list out the opposite
branches. You'll soon see the purpose of the opposite branches.
Jcc Instructions That Test Flags
Instruction | Description | Condition | Aliases | Opposite |
---|
JC | Jump if carry | Carry = 1 | JB, JNAE | JNC |
JNC | Jump if no carry | Carry = 0 | JNB, JAE | JC |
JZ | Jump if zero | Zero = 1 | JE | JNZ |
JNZ | Jump if not zero | Zero = 0 | JNE | JZ |
JS | Jump if sign | Sign = 1 | - | JNS |
JNS | Jump if no sign | Sign = 0 | - | JS |
JO | Jump if overflow | Ovrflw=1 | - | JNO |
JNO | Jump if no Ovrflw | Ovrflw=0 | - | JO |
JP | Jump if parity | Parity = 1 | JPE | JNP |
JPE | Jump if parity even | Parity = 1 | JP | JPO |
JNP | Jump if no parity | Parity = 0 | JPO | JP |
JPO | Jump if parity odd | Parity = 0 | JNP | JPE |
Jcc Instructions for Unsigned Comparisons
Instruction | Description | Condition | Aliases | Opposite |
---|
JA | Jump if above (>) | Carry=0, Zero=0 | JNBE | JNA |
JNBE | Jump if not below or equal (not <=) | Carry=0, Zero=0 | JA | JBE |
JAE | Jump if above or equal (>=) | Carry = 0 | JNC, JNB | JNAE |
JNB | Jump if not below (not <) | Carry = 0 | JNC, JAE | JB |
JB | Jump if below (<) | Carry = 1 | JC, JNAE | JNB |
JNAE | Jump if not above or equal (not >=) | Carry = 1 | JC, JB | JAE |
JBE | Jump if below or equal (<=) | Carry = 1 or Zero = 1 | JNA | JNBE |
JNA | Jump if not above (not >) | Carry = 1 or Zero = 1 | JBE | JA |
JE | Jump if equal (=) | Zero = 1 | JZ | JNE |
JNE | Jump if not equal (<>) | Zero = 0 | JNZ | JE |
Jcc Instructions for Signed Comparisons
Instruction | Description | Condition | Aliases | Opposite |
---|
JG | Jump if greater (>) | Sign = Ovrflw and Zero=0 | JNLE | JNG |
JNLE | Jump if not less than or equal (not <=) | Sign = Ovrflw and Zero=0 | JG | JLE |
JGE | Jump if greater than or equal (>=) | Sign = Ovrflw | JNL | JNGE |
JNL | Jump if not less than (not <) | Sign = Ovrflw | JGE | JL |
JL | Jump if less than (<) | Sign <> Ovrflw | JNGE | JNL |
JNGE | Jump if not greater or equal (not >=) | Sign <> Ovrflw | JL | JGE |
JLE | Jump if less than or equal (<=) | Sign <> Ovrflw or Zero = 1 | JNG | JNLE |
JNG | Jump if not greater than (not >) | Sign <> Ovrflw or Zero = 1 | JLE | JG |
JE | Jump if equal (=) | Zero = 1 | JZ | JNE |
JNE | Jump if not equal (<>) | Zero = 0 | JNZ | JE |
On the 80286 and earlier, these instructions are all two bytes long: a one
byte opcode followed by a one byte displacement. Although this leads to very
compact instructions, a single byte signed displacement only allows a range
of roughly ±128 bytes (-128 to +127). There is a simple trick you
can use to overcome this limitation on these earlier processors:
- Whatever jump you're using, switch to its opposite form (given in the
tables above).
- Once you've selected the opposite branch, use it to jump over a jmp
instruction whose target address is the original target address.
For example, to convert:
jc Target
to the long form, use the following sequence of instructions:
jnc SkipJmp
jmp Target
SkipJmp:
If the carry flag is clear (NC=no carry), then control transfers to label
SkipJmp, the same point you would reach if the original jc instruction were
not taken. If the carry flag is set when encountering this sequence, control
will fall through the jnc instruction to the jmp instruction that will
transfer control to Target. Since the jmp instruction allows 16 bit
displacements and far operands, you can jump anywhere in memory using this
trick.
One brief comment about the "opposites" column is in order. As
mentioned above, when you need to manually extend a branch beyond ±128
bytes, you should choose the opposite branch to branch around a jump to the
target location. As you can see in the "aliases" column above, many conditional
jump instructions have aliases. This means that there will be aliases for
the opposite jumps as well. Do not use any aliases when extending branches
that are out of range. With only two exceptions, a very simple rule completely
describes how to generate an opposite branch:
- If the second letter of the jcc instruction is not an "n", insert an "n"
after the "j". E.g., je becomes jne and jl becomes jnl.
- If the second letter of the jcc instruction is an "n", then remove that
"n" from the instruction. E.g., jng becomes jg, jne becomes je.
The two exceptions to this rule are jpe (jump parity even) and jpo (jump
parity odd). These exceptions cause few problems because (a) you'll hardly
ever need to test the parity flag, and (b) you can use the aliases jp and
jnp in place of jpe and jpo. The "N/No N" rule applies to jp and jnp.
Though you know that jge is the opposite of jl, get in the habit of using
jnl rather than jge. It's too easy in an important situation to start
thinking "greater is the opposite of less" and substitute jg instead. You
can avoid this confusion by always using the "N/No N" rule.
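To illustrate the rule, here is a sketch of extending an out of range jl using its "N/No N" opposite. The labels SkipJmp and Target are placeholders assumed to exist elsewhere in the program.

```asm
; Long form of "jl Target" when Target is more than 128 bytes away.
        cmp     ax, bx
        jnl     SkipJmp         ;Opposite of jl: insert an "n" after the "j".
        jmp     Target          ;Executes only when ax < bx (signed).
SkipJmp:
```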
MASM 6.x and many other modern 80x86 assemblers will automatically convert
out of range branches to this sequence for you. There is an option that
will allow you to disable this feature. For performance critical code that
runs on 80286 and earlier processors, you may want to disable this feature
so you can fix the branches yourself. The reason is quite simple: the standard
fix always flushes the pipeline no matter which condition is true, since
the CPU jumps in either case. One nice thing about conditional jumps is
that you do not flush the pipeline or the prefetch queue if you do not take
the branch. If one condition is true far more often than the other, you
might want to use the conditional jump to transfer control to a jmp nearby,
so you can continue to fall through as before. For example, if you have a
je Target instruction and Target is out of range, you could convert it to
the following code:
je GotoTarget
.
.
.
GotoTarget: jmp Target
Although a branch to Target now requires executing two jumps, this is much
more efficient than the standard conversion if the zero flag is normally
clear when executing the je instruction.
The 80386 and later processors provide an extended form of the conditional
jump that is four bytes long, with the last two bytes containing a 16 bit
displacement. These conditional jumps can transfer control anywhere within
the current code segment. Therefore, there is no need to worry about manually
extending the range of the jump. If you've told MASM you're using an 80386
or later processor, it will automatically choose the two byte or four byte
form, as necessary. See Chapter Eight to learn how to tell MASM you're using
an 80386 or later processor.
The 80x86 conditional jump instructions give you the ability to split program
flow into one of two paths depending upon some logical condition. Suppose
you want to increment the ax register if bx is equal to cx. You can
accomplish this with the following code:
cmp bx, cx
jne SkipStmts
inc ax
SkipStmts:
The trick is to use the opposite branch to skip over the instructions you
want to execute if the condition is true. Always use the "opposite
branch (N/no N)" rule given earlier to select the opposite branch.
You can make the same mistake choosing an opposite branch here as you could
when extending out of range jumps.
You can also use the conditional jump instructions to synthesize loops.
For example, the following code sequence reads a sequence of characters
from the user and stores each character in successive elements of an array
until the user presses the Enter key (carriage return):
mov di, 0
ReadLnLoop: mov ah, 0 ;INT 16h read key opcode.
int 16h
mov Input[di], al
inc di
cmp al, 0dh ;Carriage return ASCII code.
jne ReadLnLoop
mov Input[di-1],0 ;Replace carriage return with zero.
For more information concerning the use of the conditional jumps to synthesize
IF statements, loops, and other control structures, see Chapter Ten.
Like the setcc instructions, the conditional jump instructions come in two
basic categories: those that test specific processor flag values (e.g., jz,
jc, jno) and those that test some condition (less than, greater than, etc.).
When testing a condition, the conditional jump instructions almost always
follow a cmp instruction. The cmp instruction sets the flags so you can use
a jb, jbe, je, jne, ja, or jae instruction to test for unsigned less than,
less than or equal, equality, inequality, greater than, or greater than or
equal. Simultaneously, the cmp instruction sets the flags so you can also do
a signed comparison using the jl, jle, je, jne, jg, and jge instructions.
The conditional jump instructions only test flags; they do not affect any
of the 80x86 flags.
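The difference between the unsigned and signed jumps shows up when the same bit pattern means different things under each interpretation. The following fragment is a minimal sketch; the labels SignedGreater and UnsignedAbove are hypothetical. Because conditional jumps leave the flags untouched, both jumps test the result of the single cmp.

```asm
        mov     ax, 0FFFFh      ;65,535 if unsigned, -1 if signed.
        mov     bx, 1
        cmp     ax, bx          ;Sets Carry=0, Zero=0, Sign=1, Ovrflw=0.
        jg      SignedGreater   ;Not taken: Sign <> Ovrflw, and -1 < 1.
        ja      UnsignedAbove   ;Taken: Carry=0 and Zero=0, and 65,535 > 1.
```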
6.9.5 The JCXZ/JECXZ Instructions
The jcxz (jump if cx is zero) instruction branches to the target address if
cx contains zero. Although you can use it anytime you need to see if cx
contains zero, you would normally use it before a loop you've constructed
with the loop instructions. The loop instruction can repeat a sequence of
operations cx times. If cx equals zero, loop will repeat the operation
65,536 times. You can use jcxz to skip over such a loop when cx is zero.
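A sketch of this guard follows. Count is a hypothetical word variable, and the loop body is a placeholder.

```asm
        mov     cx, Count       ;Iteration count; may be zero.
        jcxz    SkipLoop        ;Without this, Count = 0 would loop 65,536 times.
RepeatLp:
        inc     ax              ;Placeholder loop body.
        loop    RepeatLp        ;cx := cx - 1; repeat while cx <> 0.
SkipLoop:
```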
The jecxz instruction, available only on 80386 and later processors, does
essentially the same job as jcxz except it tests the full ecx register. Note
that the jcxz instruction only checks cx, even on an 80386 in 32 bit mode.
There are no "opposite" jcxz or jecxz instructions. Therefore, you cannot
use the "N/No N" rule to extend the jcxz and jecxz instructions. The easiest
way to solve this problem is to break the instruction up into two
instructions that accomplish the same task:
jcxz Target
becomes
test cx, cx ;Sets the zero flag if cx=0
je Target
Now you can easily extend the je instruction using the techniques from the
previous section.
The test instruction above will set the zero flag if and only if cx contains
zero. After all, if there are any non-zero bits in cx, logically anding them
with themselves will produce a non-zero result. This is an efficient way to
see if a 16 or 32 bit register contains zero. In fact, this two instruction
sequence is faster than the jcxz instruction on the 80486 and later
processors. Indeed, Intel recommends the use of this sequence rather than
the jcxz instruction if you are concerned with speed. Of course, the jcxz
instruction is shorter than the two instruction sequence, but it is not
faster. This is a good example of an exception to the rule "shorter
is usually faster."
The jcxz instruction does not affect any flags, whereas the test/je
replacement above does change them.
6.9.6 The LOOP Instruction
This instruction decrements the cx register and then branches to the target
location if the cx register does not contain zero. Since this instruction
decrements cx then checks for zero, if cx originally contained zero, any
loop you create using the loop instruction will repeat 65,536 times. If you
do not want to execute the loop when cx contains zero, use jcxz to skip over
the loop.
There is no "opposite" form of the loop instruction, and like the jcxz/jecxz
instructions the range is limited to ±128 bytes on all processors. If you
want to extend the range of this instruction, you will need to break it down
into discrete components:
; "loop lbl" becomes:
dec cx
jne lbl
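You can easily extend this jne to any distance. Keep in mind that, unlike
loop, the dec instruction modifies several flags, including the zero flag.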
There is no eloop instruction that decrements ecx and branches if not zero
(there is a loope instruction, but it does something else entirely). The
reason is quite simple. As of the 80386, Intel's designers stopped
wholeheartedly supporting the loop instruction. Oh, it's there to ensure
compatibility with older code, but it turns out that the dec/jne
instructions are actually faster on the 32 bit processors. Problems in the
decoding of the instruction and the operation of the pipeline are
responsible for this strange turn of events.
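When a count needs the full 32 bits, the discrete instructions do the job. This is a sketch only, assuming an 80386 or later has been selected in the assembler; BigLp is a hypothetical label and the body is a placeholder.

```asm
        mov     ecx, 100000     ;Count too large for the 16 bit cx register.
BigLp:
        inc     eax             ;Placeholder loop body.
        dec     ecx             ;Sets the zero flag when ecx reaches zero.
        jnz     BigLp           ;Repeat while ecx <> 0.
```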
Although the loop instruction's name suggests that you would normally create
loops with it, keep in mind that all it is really doing is decrementing cx
and branching to the target address if cx does not contain zero after the
decrement. You can use this instruction anywhere you want to decrement cx
and then check for a zero result, not just when creating loops. Nonetheless,
it is a very convenient instruction to use if you simply want to repeat a
sequence of instructions some number of times. For example, the following
loop initializes a 256 element array of bytes to the values 0, 1, 2, ..., 255:
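mov ecx, 255 ;Index and count; the H.O. word of ecx is zero.
ArrayLp: mov Array[ecx], cl ;Array[i] := i for i = 255 down to 1.
loop ArrayLp
mov Array[0], 0 ;The loop never reaches element zero.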
The last instruction is necessary because the loop does not repeat when cx
is zero. The last element of the array that the loop itself processes is
therefore Array[1], so the final mov must store Array[0] separately.
The loop instruction does not affect any flags.
6.9.7 The LOOPE/LOOPZ Instruction
Loope/loopz (loop while equal/zero; they are synonyms for one another) will
branch to the target address if cx is not zero and the zero flag is set.
This instruction is quite useful after a cmp or cmps instruction, and is
marginally faster than the comparable 80386/486 instructions if you use all
the features of this instruction. However, this instruction plays havoc with
the pipeline and superscalar operation of the Pentium so you're probably
better off sticking with discrete instructions rather than using this
instruction. This instruction does the following:
cx := cx - 1
if ZeroFlag = 1 and cx <> 0, goto target
The loope instruction falls through on one of two conditions. Either the
zero flag is clear or the instruction decremented cx to zero. By testing the
zero flag after the loope instruction (with a je or jne instruction, for
example), you can determine the cause of termination.
This instruction is useful if you need to repeat a loop while some value
is equal to another, but there is a maximum number of iterations you want
to allow. For example, the following loop scans through an array looking
for the first non-zero byte, but it does not scan beyond the end of the
array:
mov cx, 16 ;Max 16 array elements.
mov bx, -1 ;Index into the array (note next inc).
SearchLp: inc bx ;Move on to next array element.
cmp Array[bx], 0 ;See if this element is zero.
loope SearchLp ;Repeat if it is.
je AllZero ;Jump if all elements were zero.
Note that this instruction is not the opposite of loopnz/loopne. If you need
to extend this jump beyond ±128 bytes, you will need to synthesize this
instruction using discrete instructions. For example, if loope Target is out
of range, you would need to use an instruction sequence like the following:
jne Quit ;ZF=0: fall out, but still decrement cx.
dec cx
je Quit2 ;cx reached zero: fall out.
jmp Target
Quit: dec cx ;loope decrements cx, even if ZF=0.
Quit2:
The loope/loopz instruction does not affect any flags. The synthesized
sequence above does, because dec modifies the zero flag.
6.9.8 The LOOPNE/LOOPNZ Instruction
This instruction is just like the loope/loopz instruction in the previous
section except loopne/loopnz (loop while not equal/not zero) repeats while
cx is not zero and the zero flag is clear. The algorithm is
cx := cx - 1
if ZeroFlag = 0 and cx <> 0, goto target
You can determine if the loopne instruction terminated because cx was zero
or if the zero flag was set by testing the zero flag immediately after the
loopne instruction. If the zero flag is clear at that point, the loopne
instruction fell through because it decremented cx to zero. Otherwise it
fell through because the zero flag was set.
This instruction is not the opposite of loope/loopz. If the target address
is out of range, you will need to use an instruction sequence like the
following:
je Quit ;ZF=1: fall out, but still decrement cx.
dec cx
je Quit2 ;cx reached zero: fall out.
jmp Target
Quit: dec cx ;loopne decrements cx, even if ZF=1.
Quit2:
You can use the loopne instruction to repeat some maximum number of times
while waiting for some other condition to be true. For example, you could
scan through an array until you exhaust the number of array elements or
until you find a certain byte using a loop like the following:
mov cx, 16 ;Maximum # of array elements.
mov bx, -1 ;Index into array.
LoopWhlNot0: inc bx ;Move on to next array element.
cmp Array[bx],0 ;Does this element contain zero?
loopne LoopWhlNot0 ;Quit if it does, or more than 16 bytes.
Although the loope/loopz and loopne/loopnz instructions are slower than the
individual instructions from which they could be synthesized, there is one
main use for these instruction forms where speed is rarely important;
indeed, being faster would make them less useful: timeout loops during I/O
operations. Suppose bit #7 of input port 379h contains a one if the device
is busy and contains a zero if the device is not busy. If you want to output
data to the port, you could use code like the following:
mov dx, 379h
WaitNotBusy: in al, dx ;Get port
test al, 80h ;See if bit #7 is one
jne WaitNotBusy ;Wait for "not busy"
The only problem with this loop is that it is conceivable that it would
loop forever. In a real system, a cable could come unplugged, someone could
shut off the peripheral device, and any number of other things could go
wrong that would hang up the system. Robust programs usually apply a timeout
to a loop like this. If the device remains busy beyond some specified
amount of time, then the loop exits and raises an error condition. The
following code will accomplish this:
mov dx, 379h ;Input port address
mov cx, 0 ;Loop 65,536 times and then quit.
WaitNotBusy: in al, dx ;Get data at port.
test al, 80h ;See if busy
loopne WaitNotBusy ;Repeat if busy and no time out.
jne TimedOut ;Branch if CX=0 because we timed out.
You could use the loope/loopz instruction if the device signaled busy with
a zero bit rather than a one.
The loopne/loopnz instruction does not affect any flags.
6.10 Miscellaneous Instructions
There are various miscellaneous instructions on the 80x86 that don't fall
into any category above. Generally these are instructions that manipulate
individual flags, provide special processor services, or handle privileged
mode operations.
There are several instructions that directly manipulate flags in the 80x86
flags register. They are
clc     Clears the carry flag
stc     Sets the carry flag
cmc     Complements the carry flag
cld     Clears the direction flag
std     Sets the direction flag
cli     Clears the interrupt enable/disable flag
sti     Sets the interrupt enable/disable flag
Note: you should be careful when using the cli instruction in your programs.
Improper use could lock up your machine until you cycle the power.
The nop instruction doesn't do anything except waste a few processor cycles
and take up a byte of memory. Programmers often use it as a place holder or
a debugging aid. As it turns out, this isn't a unique instruction; it's just
a synonym for the xchg ax, ax instruction.
The hlt instruction halts the processor until a reset, non-maskable
interrupt, or other interrupt (assuming interrupts are enabled) comes along.
Generally, you shouldn't use this instruction on the IBM PC unless you really
know what you are doing. This instruction is not equivalent to the halt
instruction of the hypothetical x86 processors from Chapter Three. Do not
use it to stop your programs.
The 80x86 provides another prefix instruction, lock, that, like the rep
instruction, affects the following instruction. However, this instruction
has little meaning on most PC systems. Its purpose is to coordinate systems
that have multiple CPUs. As systems become available with multiple
processors, this prefix may finally become valuable. You need not be too
concerned about this here.
The Pentium provides two additional instructions of interest to real-mode
DOS programmers. These instructions are cpuid and rdtsc. If you load eax
with zero and execute the cpuid instruction, the Pentium (and later
processors) will return, in eax, the maximum value cpuid allows as a
parameter. For the Pentium, this value is one. If you load the eax register
with one and execute the cpuid instruction, the Pentium will return CPU
identification information in eax. Since this instruction is of little value
until Intel produces several additional chips in the family, there is no
need to consider it further here.
The second Pentium instruction of interest is the rdtsc (read time stamp
counter) instruction. The Pentium maintains a 64 bit counter that counts
clock cycles starting at reset. The rdtsc instruction copies the current
counter value into the edx:eax register pair. You can use this instruction
to accurately time sequences of code.
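One way to time a code sequence is to read the counter before and after it and subtract. The sketch below assumes a Pentium or later processor, an assembler that accepts the rdtsc mnemonic, and two hypothetical dword variables, StartLo and StartHi, declared elsewhere.

```asm
        rdtsc                   ;edx:eax := cycle count before the code.
        mov     StartLo, eax
        mov     StartHi, edx

        ; ... code sequence to time goes here ...

        rdtsc                   ;edx:eax := cycle count after the code.
        sub     eax, StartLo    ;Subtract the L.O. dwords,
        sbb     edx, StartHi    ;then the H.O. dwords with borrow.
                                ;edx:eax now holds the elapsed cycles.
```

The result includes the few cycles consumed by the rdtsc and mov instructions themselves.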
Besides the instructions presented thus far, the 80286 and later processors
provide a set of protected mode instructions. This text will not consider
those protected mode instructions that are useful only to those who are
writing operating systems. You would not even use these instructions in
your applications when running under a protected mode operating system like
Windows, UNIX, or OS/2. These instructions are reserved for the individuals
who write such operating systems and drivers for them.
Art of Assembly: Chapter Six - 26 SEP 1996